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class  of  problems  in  an  efficient  way,  we  make  use  of  surrogates  based  on  CPU  times  of 
previously  evaluated  points,  rather  than  their  function  values,  all  within  the  SEARCH 
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evaluation  to  allow  its  termination  during  the  comparison  process  if  the  computational 
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1  Introduction 

In  this  paper,  we  introduce  a  new  class  of  optimization  problems  and  a  novel  approach 
for  solving  them.  This  class  consists  of  minimizing  an  objective  function  f  :  X  C 
R”  — >  R  that  is  computationally  expensive  to  evaluate,  but  becomes  significantly  less 
so  as  the  solution  is  approached.  Typically,  the  objective  function  involves  some  type 
of  realistic  engineering  simulation,  which  in  practice  can  exceed  days  or  weeks  of  wall 
clock  time  on  even  the  largest  parallel  systems,  making  it  infeasible  to  naively  perform 
full  simulation  runs  on  a  large  set  of  problem  instances.  While  tackling  the  extreme 
computational  requirements  of  these  large  simulations  is  our  ultimate  goal,  our  primary 
motivation  in  this  paper  comes  from  an  application  that  involves  both  a  fluid  dynamics 
simulation  and  an  image  registration  process,  the  latter  of  which  becomes  much  less 
expensive  close  to  the  solution. 

This  unique  property  of  correlation  between  objective  function  values  and  the  com¬ 
putational  time  required  to  compute  them  leads  us  to  a  solution  approach  that  inte¬ 
grates  CPU  runtime  measures  into  the  optimization  process,  allowing  us  to  better 
utilize  computational  resources  and  significantly  reduce  computational  time.  Because 
of  the  computational  expense,  our  approach  also  involves  the  iterative  use  of  surrogates. 
In  this  context,  a  surrogate  can  be  thought  of  as  a  much  less  expensive  replacement 
for,  but  not  necessarily  a  good  approximation  to,  the  objective  function. 

To  maintain  rigorous  convergence  properties  of  the  approach,  the  use  of  surrogates 
is  incorporated  into  the  SEARCH  step  of  the  class  of  mesh  adaptive  direct  search  (MADS) 
algorithms.  MADS  was  introduced  by  Audet  and  Dennis  [6]  as  a  way  of  extending  gen¬ 
eralized  pattern  search  (GPS)  [4,11,12,23]  to  optimization  problems  with  nonlinear 
constraints  without  the  use  of  penalty  parameters  [13]  or  filters  [5].  Like  GPS,  each 
iteration  of  MADS  consists  of  a  search  and  a  poll  step.  The  search  step,  which  con¬ 
sists  of  evaluating  the  objective  function  at  any  finite  number  of  points,  allows  for  the 
use  of  surrogates  to  handle  computationally  expensive  functions.  The  poll  step,  which 
consists  of  evaluating  adjacent  points  on  a  mesh  formed  by  directions  that  positively 
span  Rn,  drives  the  convergence  theory  of  the  algorithm.  Since  our  solution  approach 
focuses  only  on  a  carefully  designed  SEARCH  step  within  the  MADS  framework,  and 
without  disturbing  the  algorithm  or  its  theoretical  convergence  properties,  we  restrict 
our  discussion  only  to  the  SEARCH  step.  Further  details  on  the  MADS  algorithm  class 
and  its  convergence  properties  can  be  found  in  [6]  or  [2]. 

The  remainder  of  the  paper  is  as  follows.  In  Section  2,  we  further  motivate  our 
work  by  discussing  the  details  of  our  application.  In  3,  we  discuss  surrogates  in  more 
detail,  including  some  specific  types  and  initialization  strategies.  In  Section  4,  we  in¬ 
troduce  new  strategies  for  incorporating  surrogates  to  efficiently  solve  our  target  class 
of  problems.  Numerical  results  on  a  specific  instance  of  our  application  are  given  in 
Section  5,  followed  by  some  concluding  remarks  in  Section  6. 
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2  An  Applicable  Class  of  Optimization  Problems 


The  class  of  problems  we  target  is  motivated  by  an  application  in  which  each  ob¬ 
jective  function  evaluation  requires  both  a  fluid  dynamics  simulation  and  an  image 
registration.  The  movement  of  fluids  in  a  region  ft  C  Rn,  n  £  {2,  3}  is  governed  by  the 
well-known  Navier-Stokes  equations: 


+  (v  '  v)w  +  Vp  =  An  +  (1  -  f3T)g, 

div  u  =  0, 


where  u  is  a  velocity  field  on  Rn,  p  is  the  pressure  field  in  12,  g  indicates  body  forces 
in  12,  Re  £  R  is  the  Reynolds  number  of  the  flow,  Pr  £  R  is  Prandtl  number  of  the 
flow,  /3  £  R  is  the  coefficient  of  thermal  expansion,  q is  the  heat  source,  T  is  the 
temperature,  and  A  =  denotes  the  Laplace  operator. 

Our  test  example  is  that  of  the  well-known  lid-driven  cavity  problem  [9] ,  in  which 
an  initially  stationary  2d  fluid  in  a  rectangular  container  is  subject  to  forces  imposed 
by  the  top  boundary  (lid)  moving  at  a  uniform  horizontal  velocity.  Since  the  Navier- 
Stokes  equations  cannot  be  solved  analytically,  they  must  be  solved  numerically  using 
a  finite  element  method  and  associated  finite  differencing  scheme.  We  use  the  method 
of  [9].  Figure  1(a)  shows  a  snapshot  of  the  fluid  velocity  at  some  positive  time.  Figure 
1(b)  shows  the  corresponding  representation  of  the  heat  function  H,  which  we  will 
use  as  reference  data.  The  heat  function  defines  the  2d  heat  flux  <&q  =  V  x  H,  and  is 
analogous  to  the  hydrodynamic  stream  function  which  defines  the  2d  velocity. 


Velocity  Field,  v  Heat  Function,  H 


Fig.  1  Example  time  snapshots  of  a  lid-driven  cavity  simulation.  The  velocity  field  (a)  at 
some  time  t  >  0  shows  a  vortical  structure.  The  heat  function  (b)  at  time  t  serves  as  our 
example  reference  image. 


Image  registration  is  the  process  of  estimating  some  optimal  transformation  u* 
between  two  images.  Thus,  a  transformation  u  is  realized  as  a  path  through  the  space 
of  images.  A  particular  choice  of  u  will  depend  on  the  needs  of  the  application.  For 
example,  in  medical  imaging  it  is  desirable  to  compare  images  with  minimal  distortion, 
V  x  u.  Other  image  comparison  tasks  benefit  from  minimizing  the  work  required  to 
“move”  the  intensities  from  one  image  to  another.  Different  types  of  transformations 
are  described  in  [15].  If  we  consider  the  classical  inner  product  space  1/2(12)  of  squared 
Lebesgue-integrable  functions  with  its  standard  induced  norm,  a  transformation  of  an 
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image  T  is  given  by  Tu(x)  =  T(x  —  u(x)),  where  u(x)  is  the  displacement  of  the  point  x. 
The  objective  is  to  minimize  the  distance  between  a  reference  image  R  and  a  template 
image  T  through  an  optimal  warp  transformation  Tu  G  1,2(12),  as  defined  by  some 
distance  measurement  D ,  and  a  smoothing  or  regularizing  term  S.  This  problem  is 
given  by 


where  a  >  0,  and 


min  DIR,  Tu\  +  aS'lw], 

u 


D[R,  Tu]  =  <p(x,u(x)) 
S^u]  =  A[m](*), 


(1) 

(2) 

(3) 


for  a  force  measurement  <p  and  partial  differential  operator  A.  The  distance  measure 
defines  a  local  spatio-temporal  force  <f>  needed  to  “move”  T  toward  R ,  creating  the  image 
warp  transformation  Tu .  The  regularizing  term  is  added  to  the  objective  function  to 
differentiate  between  possible  transformations,  since  the  minimum  distance  may  not 
be  unique,  and  one  type  of  transformation  may  be  preferred  over  another.  Applying 
the  Euler-Lagrange  equations  to  (1) — (3)  yields  the  system  of  nonlinear  differential 
equations, 


A[u](x)  —  4>(x,  u(x))  =  0, 


from  which  the  iteration  scheme, 

A[uk+1]{x)  —  4>(x,  uk{x))  =  0,  (4) 

is  constructed.  In  particular,  we  chose  the  following,  consistent  with  [15]: 

1  i  n  r 

D[R,TU]  =  -||  Tu  -  R\\L2{n),  S[u]  =  g  E  /  i^fdx,  A[u]  =  A2u. 

Figure  2  shows  an  example  of  an  image  registration  of  two  different  simulated  flows 
of  heat  for  a  fluid.  The  top  left  picture  is  the  reference  image  R  for  a  specific  set  of 
(unknown)  parameter  values  (e.g.,  Reynolds  number  of  the  fluid).  For  a  different  set  of 
parameter  values,  the  simulation  is  run,  resulting  in  the  template  image  T  shown  in  the 
top  right  picture.  The  image  registration  is  then  applied  using  (4)  to  solve  ( 1)— (3) ,  the 
solution  of  which  is  an  optimal  warp  function  u*(x).  The  resulting  warped  template 
image  Tu  and  the  difference  between  R  and  Tu  are  shown  in  the  bottom  left  and  right 
images,  respectively. 

Given  pixel  points  {xi}^=i  C  R™,  where  N  is  the  number  of  pixels  in  the  image, 
the  objective  function  is  a  measure  of  dissimilarity  between  the  images;  namely, 

1  N 

d=\\u*(x)~u\\2,  u=—J2u*(xi)  (5) 

2  =  1 

The  subtraction  of  the  means  in  (5)  allows  for  a  zero  objective  function  value  for 
images  that  are  identical  except  for  translational  alignment  issues.  If  the  images  are 
very  similar,  the  numerical  image  registration  scheme  (4)  requires  only  a  few  iterations 
to  transform  T  into  R,  resulting  in  less  computational  time  and  a  small  distance  value. 
On  the  other  hand,  images  that  are  relatively  dissimilar  require  more  iterations  of  (4), 
which  would  increase  the  computational  time  and  distance  value.  Thus,  the  objective 
function  (distance)  value  and  its  associated  computational  time  are  expected  to  be 
strongly  correlated. 
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Reference,  R  Template,  T 


Fig.  2  Image  registration  example.  The  noisy  intensities  of  the  template  image  (b)  are  trans¬ 
formed  into  a  registered  image  (c)  intended  to  match  a  reference  image  (a).  The  optimal 
transformation  u*  is  determined  by  equation  1 .  The  residual  R  —  Tu  is  shown  in  (d) . 


3  Surrogate  Functions 

The  idea  for  surrogates  first  appeared  as  “approximation  concepts”  in  the  work  of 
Schmit  and  Miura  [10].  Booker  et  al.  [8]  characterize  a  class  of  problems  for  which 
surrogate  functions  would  be  an  appropriate  approach,  suggest  a  surrogate  compo¬ 
sition,  and  set  forth  a  general  Surrogate  Management  Framework  (SMF)  for  using 
surrogates  to  numerically  solve  optimization  problems.  Most  surrogates  are  one  of  two 
types:  simplified  physics  or  response-based.  A  simplified  physics  model,  also  known  as 
a  low-fidelity  model,  makes  certain  physical  assumptions  that  significantly  reduce  the 
computational  cost  by  eliminating  complex  equations  and  even  the  number  of  vari¬ 
ables.  Although  several  novel  approaches  exist  in  the  literature  for  treating  this  class 
of  surrogates  (e.g.,  see  [18]  or  [3]),  the  actual  construction  of  the  models  is  problem- 
dependent. 

To  handle  our  target  class  of  problems,  we  use  as  surrogates  a  class  of  response- 
based  Kriging  approximation  models  [19].  Given  a  set  of  known  data  points  C 

R”  and  their  function  values  or  deterministic  responses  ys  £  Rm  [i.e.,  [j/s]i  =  y{si),  i  = 
i,2, ,  m),  the  deterministic  function  y(z )  is  modeled  as  a  realization  of  a  stochastic 
process, 


p 

Y(z)  =  £ /**/,-(*)  +  Z(z )  =  /3T/>)  +  Z(z),  (6) 

j= 1 

where  Y(z)  is  the  sum  of  a  regression  model  with  coefficients  p  =  \j3\,  P2,  ■  ■  ■ ,  Pp]  £  Rp 
and  basis  functions  /  =  \fi,  f2i  ■  ■  ■ ,  fp],  and  a  random  variable  Z(z),  where  Z  :  R"  — > 
R,  with  mean  zero  and  covariance  V(w,z)  =  <t2R{9,w,z)  between  Z(w)  and  Z(z), 
a2  is  the  process  variance,  and  R(9,  w,  z)  is  the  correlation  function  of  w  and  2.  The 
parameter  9  £  R”  controls  the  shape  of  the  correlation  function. 
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Kriging  produces  an  approximate  function  value  at  an  unknown  point  2  €  R"  using 
weights  on  known  responses;  namely, 

y(z)  =  c(x)Tys,  (7) 

where  c(x)  £  Rm  is  a  vector  of  weights  obtained  by  minimizing  the  mean  squared 
error  (MSE)  between  the  true  model  (6)  and  the  approximate  model  (7),  while  forcing 
the  predictor  to  be  unbiased.  Details  for  computing  c(x)  are  given  in  [19]  or  [14],  for 
example. 

The  DACE  process  requires  specification  of  the  data  sites,  and  regression  and  cor¬ 
relation  functions.  For  constructing  an  initial  surrogate,  the  set  of  initial  data  sites  can 
be  chosen  via  experimental  design  [20]  or  by  sampling  a  set  of  “space-filling”  points, 
such  as  Latin  hypercube  designs  [21,22]  or  orthogonal  arrays  [17].  Although  experi¬ 
mental  designs  generally  require  more  function  evaluations,  we  chose  to  use  a  central 
composite  design  (CCD)  because  the  small  dimension  of  our  problem  allowed  us  to 
sample  enough  points  to  form  a  more  accurate  second-order  regression  model. 

The  choice  of  correlation  function  can  significantly  impact  the  performance  of  the 
algorithm.  A  common  choice  in  practice  [14]  is  the  Gaussian  function,  given  by 

n 

R{9 ,  a,  b)  =  Rj(9j ,  |dj|),  \dj\)  =  exp (—9jdj),  dj  =  aj  —  bj  (8) 

3= 1 

for  any  points  a,  &  €  R" .  In  constructing  the  surrogate,  the  values  for  9  =  (9\ ,  92 ,  •  •  • ,  9n ) , 
are  computed  as  the  solution  to  an  optimization  problem,  based  on  previously  evaluated 
points.  The  optimization  process  requires  the  computation  of  R(9,x,x)~1 .  However, 
this  is  difficult  numerically  because  R  can  become  ill-conditioned  as  points  cluster 
together  during  the  convergence  process  of  MADS  [7].  The  increase  in  the  condition 
number  of  R(9,  x,  x)  (which  can  be  estimated  and  monitored)  can  greatly  impact  the 
computed  value  of  9  or  prevent  the  calculation  of  the  regression  coefficients  altogether. 
If  the  condition  number  is  too  large,  then  the  SEARCH  step  is  skipped. 


4  New  Surrogate  Strategies 

In  this  section,  we  introduce  a  new  SEARCH  step  for  CPU  time-correlated  functions. 
First,  if  objective  function  values  and  CPU  times  are  positively  correlated,  then  im¬ 
provement  in  the  objective  function  should  not  be  expected  once  the  computational 
time  exceeds  a  certain  amount.  To  avoid  wasting  unnecessary  CPU  time,  we  introduce 
a  CPU  time  threshold  parameter  t >  0  to  allow  a  function  evaluation  to  be  aborted 
if  it  is  taking  too  long  to  perform.  A  value  of  =  00  means  that  the  function  is 
evaluated  normally  without  being  aborted. 

Using  this  notation  an  evaluation  of  /  can  now  be  represented  by  [2,  f]  =  f(y,  t^), 
where  t  is  the  time  needed  to  compute  the  function  value  2  at  a  trial  point  y  €  X  and 
for  a  specified  value  of  Once  the  time  for  computing  the  function  value  exceeds 
the  value  specified  by  evaluation  of  f(y)  is  aborted  without  returning  a  value  for 
2  (or  2  is  set  to  be  infinity  or  an  arbitrarily  large  number).  One  possible  approach  we 
considered  is  to  set  =  1 where  tk  is  the  recorded  CPU  time  for  the  current  best 
iterate  Xk- 

The  CPU  time  correlation  also  means  that  a  surrogate  based  on  either  the  objective 
function  values  or  CPU  times  would  probably  be  a  good  predictor  of  decrease  in  the 
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objective.  In  fact,  a  surrogate  on  the  CPU  time  has  the  added  advantage  that  it  always 
returns  a  value,  whereas,  the  objective  function  would  be  aborted  if  is  exceeded  at 
iteration  k.  We  denote  these  surrogates  on  objective  values  and  CPU  times  by  Fj~(x) 
and  Tk(x)  (at  iteration  k),  respectively,  and  we  consider  the  following  four  surrogate 
optimization  problems,  whose  solutions  we  would  expect  to  be  good  subsequent  trial 
points  for  the  true  objective  function: 


min  Fk(x), 
x£X 

(9) 

min  Tk(x), 

x£X 

(10) 

min  Fk(x),  s.t. 
xGX 

Tk{x)  <  t tut  +£, 

(11) 

min  Tk(x),  s.t. 
xGX 

—  %k  • 

(12) 

The  parameter  e  >  0  added  to  the  constraint  in  (11)  is  a  constant  offset  to  allow  for 
variability  in  computational  time. 

The  surrogate  optimization  problem  is  typically  solved  using  a  recursive  call  to 
MADS.  Constraints  in  (11)— (12)  are  treated  by  the  barrier  approach  of  only  allowing 
feasible  points  (or  setting  the  function  value  at  any  infeasible  trial  point  to  infinity). 

We  should  note  that  the  combination  of  MADS  with  a  barrier  and  the  use  of  the 
in  the  original  optimization  problem  causes  a  dilemma  when  using  surrogate  functions. 
Using  the  parameter  t™*  to  stop  unprofitable  function  evaluations  is  good  for  saving 
computational  time,  but  it  results  in  infinite  or  arbitrarily  large  function  values,  which 
cannot  be  used  to  construct  a  good  surrogate.  To  overcome  this,  we  simply  set  the 
function  value  to  the  largest  value  seen  thus  far  in  the  iteration  process,  whenever  the 
time  cutoff  threshold  is  exceeded.  Our  algorithm  can  be  summarized  as  simply  MADS 
with  the  specific  fcth  SEARCH  step  given  in  Figure  3. 


•  For  all  previously  evaluated  points  Xk .  construct  surrogate  functions  F/.  (x)  or  Tk  (x). 

•  Solve  a  surrogate  problem  (one  of  (9)— (12)),  yielding  a  set  of  trial  points  Sk- 

•  Evaluate  points  y  £  Sk  using  [z,t]  =  f(y-  )  until  an  improved  mesh  point  has 
been  found,  or  until  all  points  in  Sk  have  been  evaluated. 

•  If  z  <  Zk  for  some  y  £  Sk, 

Set  xk+ 1  =  y,  Zfc+i  =  z,  tfc+i  =  t,  and  update  t^. 

Set  k  =  k  +  1  and  repeat  SEARCH 

End 


Fig.  3  MADS  SEARCH  step  k  for  CPU  time-correlated  functions 


5  A  Numerical  Example 

We  now  present  a  numerical  example,  which  is  a  specific  example  of  the  class  of  prob¬ 
lems  described  in  the  Section  2.  The  problem  is  the  well-studied  lid-driven  cavity  prob¬ 
lem  [15].  For  a  given  two-dimensional  square  domain,  the  Navier-Stokes  equations 
describe  a  fluid  flow  with  a  horizontal  velocity  force  on  one  edge  (see  Figure  1).  The 
fluid  begins  at  rest,  and  as  time  starts,  a  constant  horizontal  velocity  is  asserted  along 
the  top  edge,  causing  a  circular  pattern  of  flow  to  appear  within  the  fluid  over  time. 


For  each  combination  of  Reynolds  number  and  simulation  length,  the  velocity  and 
viscosity  of  the  fluid  form  a  different  circular  heat  pattern  throughout  the  region.  At 
one  particular  Reynolds  number  and  simulation  length,  a  reference  image  of  the  heat 
pattern  is  captured  and  then  noise  is  added  to  the  image,  so  as  to  represent  what  one 
might  to  see  in  experimentally  obtained  physical  measurements.  As  stated  in  Section  2, 
the  goal  is  to  run  a  simulation  for  different  Reynolds  numbers  and  simulation  lengths, 
capture  the  template  image,  and  compare  the  template  and  reference  images  of  heat 
in  an  attempt  to  determine  the  original  Reynolds  number  and  simulation  length  set 
for  the  reference  image.  Figure  2  shown  earlier  actually  shows  reference  and  template 
images  for  this  very  problem. 

To  solve  the  problem,  the  MADS  algorithm  with  the  SEARCH  step  described  by  3 
was  run  using  two  MATLAB®  software  packages,  NOMADm  [1]  for  the  implementa¬ 
tion  of  MADS,  and  DACE  [16]  to  build  the  surrogates,  along  with  some  custom-built 
files  used  for  the  SEARCH  step.  The  surrogate  optimization  problem  is  solved  by  a 
recursive  call  to  the  NOMADm  optimizer  from  within  the  SEARCH  step. 

For  each  scenario,  different  variations  of  the  algorithm  are  applied  and  compared  to 
a  base  case.  The  base  case  implementation  uses  GPS  with  an  empty  SEARCH  step  (i.e., 
it  is  skipped),  a  single  initial  point  or  set  of  CCD  points,  and  t™*  =  oo  for  all  k.  This 
allows  for  a  full  evaluation  of  all  points  and  a  comparative  analysis  of  the  proposed 
algorithm.  The  other  cases  use  the  partial  and  full  implementation  of  the  SEARCH  step 
presented  in  Figure  3. 

We  first  made  some  preliminary  runs  using  a  strategy  of  setting  i  =  fj.  at  each 
iteration.  However,  these  runs  turned  out  to  be  unsuccessful  in  forming  good  surrogates 
( i.e .,  surrogates  that  routinely  found  good  trial  points  to  evaluate)  because  our  main 
assumption  of  CPU  time  correlation  turned  out  to  only  hold  locally.  If  the  template 
image  is  too  dissimilar  from  the  reference  image,  the  image  registration  process  actually 
terminates  prematurely  -  with  a  lower  CPU  time  and  a  much  worse  objective  function 
value.  Setting  =  2t^  at  each  iteration  seemed  to  rectify  the  situation.  We  also 
experienced  ill-conditioning  of  the  correlation  matrix  as  a  solution  is  approached,  which 
is  caused  by  trial  points  becoming  more  clustered  together.  This  was  remedied  by 
invoking  an  empty  SEARCH  (i.e.,  not  optimizing  the  surrogate)  whenever  the  matrix 
became  ill-conditioned.  This  is  not  unreasonable  in  this  case  because  the  CPU  time 
correlation  means  that  function  values  are  probably  much  less  expensive  at  this  point 
in  the  iteration  process,  and  the  use  of  surrogates  is  then  not  as  important. 

The  results  for  each  case  are  shown  in  Table  1,  where  the  column  headings  denote, 
respectively,  the  type  of  SEARCH  step  executed  (Search),  type  of  initial  points  used 
(*o),  final  solution  (**),  number  of  iterations  (niter),  number  of  function  evaluations 
(nFunc),  CPU  minutes  required  (CPU),  and  the  ratio  of  successful  to  total  surrogate 
search  steps  executed  (Successes).  Except  for  the  “None”  designation,  the  search 
types  refer  to  the  four  surrogate  strategies  shown  in  (9)-(12).  The  first  letter  indicates 
the  objective  function  ( F(x )  vs.  T(x)),  and  the  second  (if  present)  indicates  the  con¬ 
straint.  The  initial  point  types  consist  of  using  a  single  initial  point  in  the  geometric 
center  of  the  bound  constrained  feasible  region  (center),  a  randomly  chosen  initial  feasi¬ 
ble  point  (random),  or  central  composite  design  (CCD).  The  final  solution  is  expressed 
as  [Reynolds  number,  simulation  length  (seconds)].  The  first  three  runs  are  base  cases 
with  no  SEARCH  step  or  time  cutoff  parameter  used,  while  the  last  seven  cases  are  dif¬ 
ferent  variations  of  the  new  SEARCH  step,  the  first  three  being  similar  to  the  base  case 
(no  SEARCH  step)  except  for  the  use  of  the  time  cutoff  parameter  to  abort  expensive 
function  evaluations.  (The  random  point  is  the  same  random  initial  point  that  was 
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used  for  the  base  case.)  The  final  four  cases  employ  one  of  the  surrogate  optimization 
problems  (9) — (12) ,  as  just  described. 


Table  1  GPS:  Lid-Driven  Cavity  Results 


Full-Time  =  +oo): 


Search 

xo 

x * 

niter 

nFunc 

CPU 

Successes 

None 

center 

[134.13,  4.76] 

56 

123 

126.73 

None 

random 

[134.12,  4.76] 

58 

118 

178.31 

None 

CCD 

[134.13,  4.76] 

40 

107 

196.58 

Cut-Time  =  2  xtk): 


Search 

#0 

X* 

niter 

nFunc 

CPU 

Successes 

None 

center 

[134.13,  4.76] 

80 

157 

257.67 

None 

random 

[134.12,  4.76] 

92 

162 

796.40 

None 

CCD 

[134.13,  4.76] 

40 

107 

109.73 

F-T 

CCD 

[134.19,  4.76] 

73 

162 

135.78 

23/44 

F 

CCD 

[134.19,  4.76] 

72 

165 

182.89 

15/34 

T-F 

CCD 

[134.19,  4.76] 

52 

133 

74.00 

7/15 

T 

CCD 

[134.19,  4.76] 

49 

127 

74.48 

5/13 

All  runs  successfully  found  the  optimal  solution  at  essentially  the  same  parameter 
values  (with  /(**)  =  0.60  in  all  cases).  As  hoped  for,  the  CPU  time  was  significantly 
lower  for  the  two  time-based  surrogates  ((10)  and  (12))  than  all  the  other  cases,  despite 
costing  more  function  evaluations  than  many  other  cases,  including  all  three  bases  cases. 
This  also  indicates  that  for  this  class  of  problems,  the  number  of  function  evaluations 
is  not  a  good  measure  of  the  efficiency  of  the  different  implementations. 

The  results  with  no  SEARCH  illustrate  the  importance  of  finding  an  appropriate 
initial  point  to  set  up  the  time  cutoff  parameter.  With  only  one  initial  point,  this 
parameter  needs  several  iterations  to  build  enough  slack  to  allow  the  sequence  of  points 
to  overcome  the  local  nature  of  the  CPU  time  correlation.  The  extra  iterations  resulted 
in  a  different  path  to  a  solution,  which  required  significantly  more  CPU  time.  However, 
when  using  a  initial  CCD  (with  an  empty  SEARCH  step),  the  results  are  identical  except 
for  the  CPU  time.  In  this  case,  using  the  time  cutoff  parameter  saved  almost  90  CPU 
minutes. 

Figure  4  further  illustrates  the  performance  of  the  four  surrogate  strategies,  with 
each  color  representing  a  different  strategy,  and  each  shape  representing  the  source 
(search  or  poll  step)  of  the  improvement.  The  figure  shows  that  the  surrogates 
based  on  computational  time  achieve  lower  function  values  quicker  than  the  surrogates 
based  on  function  values.  Furthermore,  regardless  of  the  surrogate  objective,  the  use  of 
a  constraint  in  each  case  resulted  in  faster  initial  convergence  than  the  corresponding 
unconstrained  version. 


6  Concluding  Remarks 


This  paper  represents  a  first  attempt  at  numerically  solving  the  challenging  class  of 
optimization  problems  in  which  function  values  and  the  CPU  times  required  to  compute 
them  are  correlated.  Exploiting  knowledge  about  CPU  time  correlation  with  objective 
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♦  Poll  ■  Search 

♦  ■  F(x)  s.t.  T(x)  ♦  ■  F(x) 

♦  bT(x)  s.t.  F(x) 

T(x)  X  Initial  Point 

Fig.  4  Decreasing  Function  Value 


function  values  appears  to  be  a  useful  and  efficient  means  of  solving  this  class  of 
problems.  One  challenge  is  dealing  with  the  extent  to  which  the  CPU  time  correlation 
property  holds  in  practice,  which  may  not  be  fully  understood.  The  implementation 
of  the  time  cutoff  parameter  was  a  useful  way  to  reduce  the  time  required  to  find 
a  numerical  solution.  However,  while  it  can  be  used  to  stop  the  image  registration 
algorithm,  it  cannot  stop  the  numerical  simulation,  since  the  image  registration  requires 
the  image  obtained  from  the  full  simulation. 

Since  is  controlled  by  the  user,  one  potential  improvement  would  be  a  more 
systematic  approach  to  updating  it,  rather  than  the  trial-and-error  approach  used 
here.  Since  function  values  and  computational  times  are  stored  in  order  to  construct 
surrogates,  they  can  also  be  used  to  measure  the  correlation  between  the  two.  Higher 
values  of  t™4  can  be  assigned  whenever  the  correlation  is  low,  and  vice  versa. 

Not  using  the  surrogates  when  ill-conditioning  occurs  was  a  simple  tactic  that  made 
sense  for  the  particular  problem  we  solved.  However,  a  more  effective  means  to  combat 
this  problem  may  be  the  use  of  a  trust  region  (e.g.,  see  [3]),  both  to  constrain  the 
optimization  of  the  surrogate  and  to  screen  out  points  used  in  the  construction  of 
the  surrogate.  The  size  of  the  trust  region  could  be  based  on  the  frame  or  mesh  size 
parameter. 
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